ABSTRACT



This Annual Report describes in detail the work performed during the
first year of Task III of Contract NSF-C414 and the present status of Task
III work.

The programs and achievements described constitute the first signifi- 
cant efforts to develop a user-oriented, cooperative program between major 
secondary scientific and technical information services— the Chemical 
Abstracts Service (CAS) information system and the National Library of 
Medicine's (NLM) MEDLARS — in conjunction with a large user of chemical and 
bio-medical information, the Food and Drug Administration (FDA). 

Experimental and developmental efforts have resulted in three new
computer systems being instituted to produce the NLM Output Tape, to produce
the Desktop Analysis Tools, and to determine and assign automatically MeSH
Class Terms for MEDLARS. In addition, CAS has performed 59,698 registrations.
These have contributed data on 21,110 substances that were new to the 
Chemical Abstracts Service files. 
INTERFACE AND INTERACTION



Chemical Abstracts Service (CAS) of the American Chemical Society is
presently developing the Chemical Compound Registry System under Task I of
Contract NSF-C414. The purpose of this computer-based system is to identify
chemical compounds uniquely and to store and retrieve structural
descriptions, chemical nomenclature, and significant literature citations
for these compounds.

In contrast to the overall Task I objective of building a Registry
System, Task III, which is supported by the National Science Foundation,
the Food and Drug Administration (FDA), and the National Library of
Medicine (NLM), has as its purpose the experimental operational interlinkage
of the NLM MEDLARS and the CAS information systems, two of the world's
largest secondary information services, and FDA, a large governmental
administrative agency that routinely uses the secondary services to perform
its mission.

The work described in this report constitutes the first significant 
effort to develop a user-oriented, cooperative program between major 
secondary scientific and technical information services that are devoted to 
closely related subjects and that overlap significantly in the source 
documents covered by each. The project, should it leave the experimental 
stage and become fully operational, offers the prospect of greatly simpli- 
fied joint utilization of NLM and CAS information services by individuals 
and organizations which require regular access to both bio-medical and 
chemical information. Equally important, the successful conclusion of this 
project offers potential operational savings both for the cooperating
secondary services and their large users. 

Overall, NSF Contract C414 was established to develop a Chemical
Compound Registry System. The Registry System is conceived to be a
computer-based recognition and file system that provides the bridge between
atomic and molecular structural characteristics and the corresponding
systematic and nonsystematic nomenclature that appears across the entire
range of
scientific and technical literature. In addition, the system directly inter- 
links the data contained in it to the source documents in which the infor- 
mation was initially reported. 

The Registry System operates by assigning a unique number, called a 
Registry Number, to each substance when it is first entered into the file. 
Whenever a substance already registered appears in a new reference, the 
previously assigned number is automatically recovered. The System has
three associated computer files: the Structure, Nomenclature, and
Bibliography Files. Within each, the Registry Number functions as a machine
address to tie together all information related to a given substance.

Much progress has been made on Task III. During the first year, 59,698
registrations were performed.* This number represents the
machine-registerable substances from contract-specified sources. Still
remaining are substances that must be manually registered because
registration conventions have not yet been established.



*A registration is defined as the process of determining the existence or
nonexistence of a substance in the Registry File.
Also as a consequence of Task III, three new computer programs were
developed. These included the routines required to produce the NLM Output
Tape (including the Combination Record), to produce specialized Desktop
Analysis Tools (DAT) for FDA and NLM, and to determine and assign
automatically MeSH Class Terms for MEDLARS, all of which are discussed in
this report.

The work on this contract has progressed in great measure because of
liaison efforts on the part of the parties involved. An example of such
cooperation is the contribution made by personnel of each organization
toward the technical upgrading of certain source lists to the level
required to process them into the Common Data Base.

A similar example is the development of the MeSH Class Term Assignment
System described in this report. This System is based on an adaptation of
the CAS Substructure Search System. It is expected that the
developed system will continue to play a useful part in MEDLARS operations 
and that the joint NLM-CAS effort in defining substructures will be useful 
to MEDLARS users. The MeSH terms that are equivalent to structural
fragments in themselves constitute a useful fragment code, which may be of
interest to MEDLARS users for their proprietary files.

In addition to regular working sessions between FDA, NLM, and CAS 
staff, a technical training session was held to acquaint a group of FDA 
chemists with the CAS Chemical Compound Registry System. A description of 
this training session is given later in this report. 
ESTABLISHING THE COMMON DATA BASE



A substantial portion of the work performed under Task III of NSF-C414
was the registration of chemical substances from the sources listed in
the contract. The structural and nomenclature information concerning these
compounds and the less-than-fully-defined substances, which together
comprise the Common Data Base, is now for the first time available in one
place in a computer-readable file.

SOURCES OF COMMON DATA BASE 



During the first year of Contract NSF-C414, Task III (1 July 1966
through 30 June 1967), 59,698 registrations were performed. Of these,
21,110 resulted in new substances being added to the CAS Chemical Compound
Registry System, while 38,588 substances matched compounds already on file.
Of the sources inspected and analyzed, most were books and journal
articles, including four reference works whose contents were registered
under Task I of this contract but are included as part of the Common Data
Base.

Table I gives a complete summary of the registrations performed during
the first year of this contract.
TABLE I

SUMMARY OF REGISTRY FILE
1 July 1966 through 30 June 1967

SOURCE                                    REGISTRATIONS PERFORMED
Name                              Code      New to  Matching   Total
                                              File  Those on
                                                        File

Code of Federal Regulations (1)   CFR         1041      3042    4083
Colour Index                      CI          3974      3024    6998
Common Names for Pesticides       CNP            4        87      91
Dangerous Properties of           DPIM          25        74      99
  Industrial Materials
Drug and Cosmetic Catalog         DCC          187      1303    1490
Drug File (CAS internal file)                 3079      6702    9781
Farm Chemicals Handbook           FCH          103       454     557
Feed Additive Compendium          FAC          105       154     259
Food Chemicals Codex              FCC           95       423     518
Guide to Chemicals Used           GCUCP         31       412     443
  in Crop Protection
Handbook of Toxicology            HT           264       226     490
International Encyclopedia        IECMTN      2255      2172    4427
  of Cosmetic Material
  Trade Names
International Non-Proprietary     INN          100       907    1007
  Names
International Pharmacopeia        IP            60       519     579
List of Colors, Appendix          LC            28        90     118
Merck Index of Chemicals          Merck       8416     12090   20506
  and Drugs (2)
MeSH Terms                        MeSH         110       810     920
Mycotoxins in Foodstuffs          MFS           63        82     145
The National Formulary            NF           164       672     836
New Drugs                         ND            30       305     335
The Pharmacopeia of the           USP           71       498     569
  United States of America
Perfumes, Cosmetics, and Soaps    PCS          323      1390    1713
Pesticide Index (2)               PI           154       748     902
"Pesticides" from Chem. Week      CW            57       332     389
South African Medical Journal     SAMJ           9         6      15
Summaries of Pesticide Toxicity   SPT           11        98     109
United States Adopted Names (2)   USAN          89       769     858
Veterinarians' Blue Book          VBB          262      1199    1461

TOTALS                                       21110     38588   59698

(1) Substances registered only from selected and specified sections. See
    Appendix B.
(2) Registered under Task I.
METHODS AND PROCEDURES



Since the Common Data Base (CDB) File is an integral part of the Chemical
Compound Registry System, the registration process for CDB substances
follows the procedures already established for the Registry System.
Processing begins with a professional review of the sources from which
substances are registered in order to select and code applicable
nomenclature. These data are then clerically keyboarded via the Mohawk 1101
Data Recorder and processed through the computer-based name-matching system.
This system compares the names of compounds being registered (usually
author-assigned or "trivial" names) against the names of compounds already
registered. Each time an exact match is achieved, the formerly registered
compound's Registry Number and molecular formula are retrieved and, together
with the input name, are printed on Data Sheets for a chemist's review.
Following any corrections made by the chemist, the data are added to the
master Registry files along with the appropriate source codes.
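
The name-matching step described above can be sketched as an exact-match
lookup. This is only an illustration under stated assumptions, not the CAS
program: the function name match_names, the layout of name_file, and the
Registry Number 1001 are hypothetical.

```python
def match_names(input_names, name_file):
    """Split input names into exact matches and names needing structure work.

    name_file maps a registered name to (registry_number, molecular_formula).
    Both the mapping layout and the numbers in it are assumptions.
    """
    matched, unmatched = [], []
    for name in input_names:
        entry = name_file.get(name)          # exact match only, as in the text
        if entry is None:
            unmatched.append(name)           # goes on to registration by structure
        else:
            registry_number, formula = entry
            # These three items would be printed on a Data Sheet for review.
            matched.append((name, registry_number, formula))
    return matched, unmatched


# Hypothetical name file with a single registered substance.
name_file = {"hexachlorobenzene": (1001, "C6Cl6")}
matched, unmatched = match_names(["hexachlorobenzene", "compound X"], name_file)
```

Names that fall into the unmatched list correspond to the compounds that are
registered by structure in the next step.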

Compounds for which there is no name match are registered by structure.
Structure diagrams drawn and reviewed by a staff chemist are clerically
keyboarded using the Mohawk 1101 Data Recorder or a chemical structure
typewriter. Substances for which the structure is unknown and substances
which are not now being automatically registered are assigned a Registry
Number by a chemist. The compound's nomenclature, molecular formula, and
source codes are associated with its Registry Number and input to the
computer file. Figure 1 illustrates the data flow for Task III substances
within the CAS Chemical Compound Registry System.

Figure 1. DATA FLOW FOR TASK III SUBSTANCES
INPUT DATA



For every compound registered under Task III, seven data items are
recorded, if they are available. Two, the Registry Number and Item Number
(see Glossary, Appendix A, for definitions), are normally assigned by the
computer (except for compounds manually registered, for which a chemist
assigns the Registry Number). Input during registration are the 1) structure,
2) molecular formula, 3) nomenclature, 4) source code(s), and 5) name-type
code(s). Not all these are entered for each substance, for in certain cases
the data may not be known. For example, there are many natural products for
which no structure has been established. Under these circumstances, the
substances would be manually registered without a structure, and in certain
cases, without even a molecular formula. However, all other information
concerning this product would be input and tied together by the Registry
Number.

The following is a brief description of each data item entered for Task 
III substances. 



Structures 

Structural descriptions, when available, are entered into the record as 
atom-bond connection tables. Devices such as a Mohawk 1101* or a structure 
typewriter are used for this purpose. In the process that utilizes the Mohawk 
1101, a clerk numbers each nonhydrogen atom in the structure diagram, then 
keyboards a connection table that lists each atom by number and indicates the 
atoms to which each atom is bonded and the type of bond involved. The computer 
then edits the table and automatically converts it into one that is compact, 
unambiguous, and unique. This latter table is compared with the master
structure file, and the previously assigned Registry Number is retrieved for
compounds that match. A Registry Number is assigned by the computer to each
compound that is new to the file. The new table is then added to the master
structure file.

*The Mohawk 1101 is a keypunch that records data directly on magnetic tape.
30 June 1967

The second method for registering structures utilizes the structure
typewriter. Here a clerk copies the chemical structure directly using the
typewriter, and the computer converts the structure to a connection table
identical to the one that would have been generated by a connection table
entry.
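
The idea of converting a clerk's arbitrarily numbered connection table into
one unique table can be sketched as follows. The report does not describe
the conversion algorithm, so this example substitutes a brute-force
renumbering that is practical only for very small structures; the function
name canonical_table and the tuple layout are assumptions.

```python
from itertools import permutations


def canonical_table(atoms, bonds):
    """Return one unique connection table for a small structure.

    atoms: element symbols of the nonhydrogen atoms, in the clerk's numbering.
    bonds: (i, j, order) triples using 0-based positions in `atoms`.
    Every renumbering is tried and the smallest resulting table is kept, so
    two numberings of the same structure always give the same result.
    This is a stand-in technique, not the method used by the Registry System.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):        # perm[old] = new atom number
        new_atoms = [None] * n
        for old, new in enumerate(perm):
            new_atoms[new] = atoms[old]
        new_bonds = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(new_atoms), tuple(new_bonds))
        if best is None or candidate < best:
            best = candidate
    return best


# Acetic acid skeleton (hydrogens omitted), numbered two different ways.
first = canonical_table(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
second = canonical_table(["O", "C", "C", "O"], [(2, 1, 1), (1, 0, 2), (1, 3, 1)])
same_structure = first == second
```

Because both inputs reduce to the same table, a lookup keyed on that table
would recover the same Registry Number regardless of how the clerk numbered
the atoms.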

Molecular Formula 

The molecular formula calculated from the connection table and recorded 
on the computer files is the modified-Hill format used in Chemical Abstracts 
(CA). Computer programs have been written to convert the modified-Hill format 
to the form desired by NLM, an inverted "NOPS sequence" molecular formula. In 
the latter form, nitrogen, oxygen, phosphorus, and sulfur (if these elements 
are present) are listed first, followed by an alphabetical listing of the 
elements excluding carbon and hydrogen, which appear last. For example,
glycine is represented by NO2C2H5; the corresponding modified-Hill
representation for this same compound is C2H5NO2.
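
The reordering rule just described can be sketched in a few lines. This is
an illustration of the stated rule only, not the CAS conversion program; the
function names parse_formula and to_nops are assumptions, and the parser
handles only simple formulas without dots, charges, or parentheses.

```python
import re

NOPS_FIRST = ["N", "O", "P", "S"]   # listed first, in this order, if present
LAST = ["C", "H"]                   # carbon and hydrogen appear last


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C2H5NO2'."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def to_nops(formula):
    """Rearrange a formula into the inverted NOPS sequence."""
    counts = parse_formula(formula)
    leading = [e for e in NOPS_FIRST if e in counts]
    middle = sorted(e for e in counts if e not in NOPS_FIRST and e not in LAST)
    trailing = [e for e in LAST if e in counts]
    return "".join(
        e + (str(counts[e]) if counts[e] > 1 else "")
        for e in leading + middle + trailing
    )


glycine_nops = to_nops("C2H5NO2")   # the glycine example from the text
```

For the glycine example, to_nops("C2H5NO2") yields "NO2C2H5", which agrees
with the form given in the text.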

Nomenclature 

The Nomenclature File of the Common Data Base (CDB) is comprised of all 
the names contained in the CAS files for a particular substance. For retrieval 
purposes they are categorized and identified in the following manner: 

• CA Preferred Index Name (inverted form) 

• CA Index Names (uninverted form)

• Added CA Index Names 

• "Trivial Names" 

Names and synonyms are selected not only from the designated sources, but
also from data already in the Registry Nomenclature File. When a compound
is registered, and a match found, synonyms already on record are included as
part of the Common Data Base.

The four specific codes used for the names incorporated into the file 
for Task III substances are detailed below: 

1. CA Preferred Index Names are the systematic names used in the
   Subject and Formula Indexes of CA. Each substance is assigned
   such a name according to the established CA nomenclature policy
   at the time. Although no formal plan exists to update these names
   once they have been assigned, some limited updating has been made
   through use during the year.

2. An Uninverted CA Index Name is the uninverted form of the CA
   Preferred Name. It is the verbal form and as such is the form usually
   recognized by chemists. For example, "hexachlorobenzene" is the
   uninverted form of the CA Preferred Name, "benzene, hexachloro-."

3. Added CA Index Names are special-purpose names that emphasize
   special structural features and are given in addition to the CA
   Preferred Index Name for a substance in the CA Subject Index.

4. "Trivial" Names are names derived by nomenclature rules other than
   those used for the CA Preferred Index Name. In addition, "trivial"
   names include such designations as laboratory numbers and trademarked
   names.

Registry Number 

The Registry Number is a unique computer-checkable number assigned to
each substance when it is first entered into the file. Whenever a substance
which is already in the file is registered, the previously assigned number
is recovered automatically. The Registry Number functions as a machine address
within the files of the Registry System to link together all information
about a given substance.
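
The behavior described here, a new number on first entry and recovery of the
existing number on every later registration, can be sketched as below. The
class name Registry, the dictionary layout, and the starting number are
assumptions for illustration; the real system keys on the unique connection
table rather than on an arbitrary string.

```python
class Registry:
    """Toy registry: one number per substance, reused on re-registration."""

    def __init__(self, first_number=1):
        self._next = first_number
        self._by_structure = {}   # unique structure key -> Registry Number
        self.names = {}           # Registry Number -> names on record
        self.sources = {}         # Registry Number -> source codes on record

    def register(self, structure_key, name=None, source=None):
        """Return (registry_number, is_new) for the given structure key."""
        number = self._by_structure.get(structure_key)
        is_new = number is None
        if is_new:
            number = self._next
            self._next += 1
            self._by_structure[structure_key] = number
            self.names[number] = []
            self.sources[number] = []
        # The number acts as the address tying the separate files together.
        if name and name not in self.names[number]:
            self.names[number].append(name)
        if source and source not in self.sources[number]:
            self.sources[number].append(source)
        return number, is_new


registry = Registry()
first = registry.register("C6Cl6", "hexachlorobenzene", "Merck")
again = registry.register("C6Cl6", "benzene, hexachloro-", "CFR")
```

The second call returns the number assigned by the first and simply adds
the new name and source code to the records held under that number.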

Source Code(s) 

Source codes have been assigned to all of the sources from which the
substances are registered for entry into the Common Data Base. These codes
are output as part of the data from the Registry System to facilitate the 
use of the system. 

Name-Type Code 

The Name Type identifies the type of name as follows:

Code   Type

1      Preferred CA Index Name (inverted form)
2      Added CA Index Name (inverted form)
3      Author Name or Trivial Name, including Laboratory Numbers
4      Preferred CA Index Name (uninverted form)
3R     Registered Trade Mark
M      Unique MeSH Term

Item Number

The Item Number is an identification number used in conjunction with the
Registry Number to provide a unique means of accessing data in the CAS
Bibliography and Nomenclature Files. Each Item Number appears as a two-to-six
character sequence, the last character being a check digit or letter used
for automatic checking of the full series of digits obtained by combining
the Registry Number and the Item Number. As an example, the name Benzoic
acid, 2,4-dihydroxy- is item 3 associated with Registry Number 89861 on the
CAS Nomenclature File. The check digit for this item is 6, which is used to
verify the entire sequence, 89861 3. The sequence of digits recorded is
thus: 89861 36.
SYSTEMS DEVELOPMENT



As previously discussed, Task III of Contract NSF-C414 is an experimental
program directed toward the development of computer-based outputs that
interface specifically with NLM and FDA information programs. To meet the
requirements of this specialized orientation, CAS has developed and has
placed into pilot plant operation the three new computer-based systems listed
below.

• The NLM Output Tape System (including the Combination Record) 

• MeSH Class Term Assignment 

• Desktop Analysis Tool System 

The original development and the continuing improvement of these programs 
represent a substantial investment in time and money. The first two systems 
are described below. The Desktop Analysis Tool System is detailed in the 
section entitled "Output".

THE NLM OUTPUT TAPE SYSTEM 

The National Library of Medicine (NLM) requires that data on each 
registered substance include available names, molecular formula, MeSH Terms, 
and components (if the substance is a mixture) as well as Registry Number, 
source codes, etc. It also requires that no data for a substance be sent to 
them unless all the necessary data items are available. In addition, the 
taped information must be suitable for processing on their Honeywell 200 
computer. To meet these requirements, CAS has developed what is referred to
as the "NLM Tape Output System," a computer system that detects, extracts,
holds, and reformats Common Data Base data to be forwarded to NLM. The
development of this system has required a great deal of communication between
NLM and CAS to obtain output in the exact format required by NLM. Although 
the basic programs have been completed for some time, many modifications have 
had to be incorporated into the routines because of initial mutual misinter- 
pretations and misunderstandings concerning formats and design specifications. 
However, the liaison between NLM and CAS personnel has served to clarify these 
needs, and the NLM Tape Output System is well on its way to meeting NLM's 
needs. 

The total Tape Output System is comprised of two subsystems, the Pending
System and the Reformat Program. The former performs essentially a
monitoring function: that of holding data about a substance until all the
required information concerning that substance is processed into the File
and is available as a unit for release.

An illustration will serve to describe the operation of the Pending
File: data for Substance X are keyboarded into the Bibliography File. The
information is comprised of the Registry Number, name, and molecular formula,
but no MeSH Term. This type of event frequently occurs because the various 
data items are input and processed independently of each other and follow 
separate processing routes to the Pending File. When the initial data are 
received by the File, the computer recognizes that not all the required infor- 
mation concerning Substance X is available and therefore the recorded data 
are held. When later the MeSH Term is added to the initial data, completing 
the record, the unit is released for further processing. 
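
The hold-and-release behavior of the Pending File can be sketched as
follows. This is a simplified illustration, not the CAS system: the class
name PendingFile, the item labels, and the choice of three required items
are assumptions (the report lists further items, such as component records
for mixtures).

```python
REQUIRED_ITEMS = {"name", "molecular_formula", "mesh_term"}   # assumed subset


class PendingFile:
    """Hold partial records until every required item has arrived."""

    def __init__(self, required=REQUIRED_ITEMS):
        self.required = set(required)
        self.held = {}   # Registry Number -> items received so far

    def receive(self, registry_number, item, value):
        """Store one item; return the complete record once nothing is missing."""
        record = self.held.setdefault(registry_number, {})
        record[item] = value
        if self.required.issubset(record):
            return self.held.pop(registry_number)   # released as a unit
        return None                                  # still held


pending = PendingFile()
pending.receive("X", "name", "Substance X")
pending.receive("X", "molecular_formula", "NO2C2H5")
released = pending.receive("X", "mesh_term", "Amino Acids")
```

Because each item arrives by its own processing route, the order of the
receive calls does not matter; the record is released only when the last
required item is added.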

Material released from the Pending File is next fed into the Reformat
Program, which performs two functions. The first is rearrangement, whereby
the data such as names, item numbers, molecular formulas, and MeSH Terms are
recorded in the sequence desired by NLM. Second, the computer program converts
the nine-track designation of the IBM 360 format into the seven-track format of
the Honeywell computer used by NLM. If, for instance, the designation for
the letter "A" in the 360 language is 01100100 and the designation for the
letter "A" in the Honeywell 200 format is 001010, the program automatically
changes the former into the latter.
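
The character conversion described above amounts to a table lookup from
8-bit codes to 6-bit codes. The sketch below uses only the illustrative bit
patterns quoted in the text, which the report itself presents as a
hypothetical instance; the function names are assumptions, and no real IBM
360 or Honeywell 200 code table is implied.

```python
def build_translation_table(pairs):
    """Build {8-bit code: 6-bit code} from pairs of bit strings."""
    table = {}
    for source_bits, target_bits in pairs:
        if len(source_bits) != 8 or len(target_bits) != 6:
            raise ValueError("expected an 8-bit source and a 6-bit target")
        table[int(source_bits, 2)] = int(target_bits, 2)
    return table


def translate(codes, table):
    """Translate a sequence of 8-bit character codes to 6-bit codes."""
    # A KeyError here means a character has no entry in the table.
    return [table[code] for code in codes]


# The single illustrative pair given in the text for the letter "A".
table = build_translation_table([("01100100", "001010")])
converted = translate([0b01100100], table)
```

A full table would carry one entry per character that can occur on the
tape; characters without an entry raise an error rather than passing
through silently.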

Once the NLM Reformat Program has made the two changes described, the
information is duplicated onto another magnetic tape which is sent to NLM.
The original tape is retained by CAS until the next output tape is produced
to prevent the information from being lost in transit. The information for
each substance registered contains the four records mentioned above, namely, 
a name record, molecular formula record, MeSH Term record, and a component 
record. This latter record contains component Registry Numbers if the sub- 
stance is a mixture and the components are identified in the literature. If 
the components are not known such as in the case of certain "oils," the 
record is filled with zeros. 

In addition to the records described above, the NLM Tape contains also 
all the identification data such as item number and Registry Numbers that 
serve to tie all the information together. A detailed flow diagram of this 
system is shown in Figure 2. 

The Combination Record 

The Combination Record is that portion of the NLM Output Tape System 
that records on magnetic tape the Registry Numbers and cross references of 
mixtures and related components (See Figure 2). 
Figure 2. NLM TAPE OUTPUT SYSTEM



A mixture is defined as a physical combination of two or more components
in which the latter's ratios and identities may or may not be given. Generally,
in the case of mixtures, registration is based on the individual components
comprising the mixture; mixtures possessing identical components receive the
same Registry Number. Each component (except for residues and generically
described components such as tars, fluorides, and fatty acids) as well as
the mixture itself is assigned a Registry Number. A component may be: 1) a
single substance such as a compound, or 2) a mixture which itself is composed
of two or more components, or 3) a residue which is either stated or clearly
implied. Residues identified only by the term "residues" are not assigned
Registry Numbers, nor are such generic terms as "fluorides" or "fatty acids."
However, these terms are considered in determining the Registry Number of
mixtures. For example, if Mixture A is comprised of Components 1 and 2 and
Mixture B of Components 1 and 2 plus a residue, they are assigned different
Registry Numbers.

Another illustration of how the above criteria operate can be shown by
the registration rationale of mixtures of mixtures. If Mixture 1 is comprised
of Compounds A, B, C, and D; and Mixture 2 is comprised of Components 1 and 2;
and it is determined that Component 1 is a mixture of Compounds A and B and
that Component 2 is a mixture of Compounds C and D, Mixtures 1 and 2 will be
assigned different Registry Numbers since Mixture 1 is described as having
four components and Mixture 2 is described as having two components. However,
the File's cross-referencing feature recognizes that Compounds A, B, C, and
D are present in both Mixtures 1 and 2 and in Components 1 and 2 respectively,
and it also recognizes that Components 1 and 2 are present in both of the
mixtures. (See Figure 3.)

Certain natural origin substances such as oils, concentrates, and juices 
represent a type of mixture that is not registered on the basis of components. 
Because the descriptions of such substances vary substantially from one source
to another due to a wide natural variation in composition, the registration 
of these substances is based on the name of the mixture including botanical 
or biological identification, if available, rather than the components. If 
components are known, they are assigned a Registry Number, and as succeeding 
sources list new components for the mixture, these are added to the record. 
However, only one Registry Number is assigned to the mixture itself. 

The Registry Number assigned to a mixture is comprised of the prefix 
MX and a seven-digit Registry Number drawn from the eight-million series, for 
example, MX8000008. The Registry Number for each component of a mixture is 
cross-referred to the Registry Number of its parent mixture. This feature 
identifies mixtures containing a given component as well as the components 
of a given mixture.
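
The component-based registration rules for mixtures can be sketched as
below. This is an illustration only: the class name MixtureRegistry, the
starting number 8000001, and the component labels are assumptions, and the
name-based registration of natural-origin substances is not covered.

```python
class MixtureRegistry:
    """Assign MX-prefixed numbers to mixtures by their described components."""

    def __init__(self, first_number=8_000_001):   # eight-million series; start assumed
        self._next = first_number
        self._by_components = {}   # frozenset of components -> mixture number
        self.parents = {}          # component -> set of parent mixture numbers

    def register(self, components):
        """Return the mixture number for this exact set of components.

        Generic terms such as "residue" carry no number of their own but are
        part of the set, so they still distinguish one mixture from another.
        """
        key = frozenset(components)
        number = self._by_components.get(key)
        if number is None:
            number = f"MX{self._next}"
            self._next += 1
            self._by_components[key] = number
        for component in key:
            # Cross-reference each component to its parent mixture.
            self.parents.setdefault(component, set()).add(number)
        return number


reg = MixtureRegistry()
mixture_a = reg.register(["component 1", "component 2"])
mixture_b = reg.register(["component 1", "component 2", "residue"])
```

Here mixture_a and mixture_b receive different numbers because the residue
is counted when the component sets are compared, while registering the same
two components again in any order returns mixture_a's number.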



MeSH CLASS TERM ASSIGNMENT 

MeSH Class Terms are generic index headings used in MEDLARS to identify
substances containing common structural units. As part of Contract NSF-C414,
Task III, approximately 500 MeSH Class Terms were defined by NLM and given
to CAS. CAS is to assign the appropriate MeSH Class Term(s) to each substance 
in the Common Data Base for which there is an adequately defined structure on 
record. 

To effect these assignments, the CAS Substructure Search System is used 
to identify all compounds containing the requisite structural unit while 
another computer program makes the assignment based upon the substructure 
search. 

The Class Terms originally defined by NLM were, in general, refined by 
CAS (with NLM approval) to produce search profiles that accurately and 
precisely defined the structure fragments. In this manner, the total
specificity inherent in the Substructure Search System could be used to
advantage. However, in the case of some 28 percent of the Class Terms, this
was not sufficient. After normal refinement and coding of the Terms, it 
became evident that a complete redefinition of the Terms was required. This 
was accomplished by the joint effort of both NLM and CAS staff and unques- 
tionably will result in an enhanced system. 

Returning to the general situation: after refinement, profiles were
coded for fragment search and, when necessary, for an iterative, atom-by-atom,
bond-by-bond search. An example of a term defined by NLM and refined by CAS
for searching is as follows:

        NLM                                  CAS

                     Aniline Compounds

(structure diagrams not reproduced)

No fused ring                        No atom attached to ring or N
No C=O(S) on N                          by a ring bond
                                     No N attached to N
                                     No =O attached to N
                                     No C=O(S) attached to N



In this example, the principal difference is "No =O attached to N". This
was added to preclude nitrobenzene derivatives from being retrieved as aniline
compounds.

Although all MeSH Class Terms are generic (by definition), some are
broader than others. The various levels of specificity are related through
a number of hierarchical series such as the one illustrated below. These
series were established principally by NLM, although additional members were
added during the refinement of the Class Terms for substructure search.
Azoles
   Imidazoles
      Benzimidazoles
      Purines
         Adenines
            Adenine Nucleotides
         Guanines
      Hydantoins
   Pyrazoles

The program created for Class Term assignment utilizes the hierarchical
series in conjunction with substructure search results to assign the appropriate
MeSH Class Terms in accordance with the NLM policy of assigning the most
specific applicable term to each chemical. For example, the structure for
adenosine triphosphate will satisfy the substructure search requests for the
following Class Terms:



     Azoles                    Phosphates
     Imidazoles                Pyrophosphates
     Purines                   Furans
     Adenines                  Ethers, Cyclic
     Adenine Nucleotides       Nucleotides
     Pyrimidines               Nucleosides



However, applying the hierarchies after substructure search results in only
the Class Terms "Adenine Nucleotides" and "Pyrophosphates" being assigned to
the structure for adenosine triphosphate.
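
The "most specific applicable term" policy can be sketched as removing every
matched term that is an ancestor of another matched term. The code is an
illustration, not the CAS assignment program. In the PARENTS table, the
Azoles chain follows the order of the series shown in the report; every
other parent link is an assumption chosen so that the example reproduces
the stated outcome for adenosine triphosphate.

```python
# Child term -> broader (parent) terms. Mostly hypothetical; see the lead-in.
PARENTS = {
    "Imidazoles": ["Azoles"],
    "Purines": ["Imidazoles", "Pyrimidines"],
    "Adenines": ["Purines"],
    "Adenine Nucleotides": ["Adenines", "Nucleotides"],
    "Nucleotides": ["Nucleosides", "Phosphates"],
    "Nucleosides": ["Furans"],
    "Furans": ["Ethers, Cyclic"],
    "Pyrophosphates": ["Phosphates"],
}


def ancestors(term, parents):
    """Return every broader term reachable from `term`."""
    found = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, ()))
    return found


def most_specific(matched, parents):
    """Keep only matched terms that are not broader than another match."""
    matched = set(matched)
    covered = set()
    for term in matched:
        covered |= ancestors(term, parents)
    return matched - covered


# Terms satisfied by the substructure search for adenosine triphosphate.
atp_matches = [
    "Azoles", "Imidazoles", "Purines", "Adenines", "Adenine Nucleotides",
    "Pyrimidines", "Phosphates", "Pyrophosphates", "Furans",
    "Ethers, Cyclic", "Nucleotides", "Nucleosides",
]
assigned = most_specific(atp_matches, PARENTS)
```

With these assumed links, only "Adenine Nucleotides" and "Pyrophosphates"
remain, which matches the result reported above.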

For a detailed flow diagram of the complete MeSH Class Term Assignment
System, see Figure 4.
Figure 4. MeSH CLASS TERM ASSIGNMENT SYSTEM
OUTPUT



Three types of output are required by Contract NSF-C414. These are
1) Desktop Analysis Tools (DAT), 2) Manual Structural File, and 3) the
Composite File on computer tape. The Composite File for NLM, referred to
as the NLM Tape Output System, has already been described in the previous
section; details concerning the first two and the Composite File for FDA
are presented below.

DESKTOP ANALYSIS TOOLS

The Task III Desktop Analysis Tool (DAT) is a computer-produced, printed 
compilation of all compound names contained in the Common Data Base. The DAT 
is a reference tool that links the Registry Number to the molecular formula 
and names of the chemical compound or substance. The three specialized DAT's
for Common Data Base compounds are: the Name Index in Alphabetical Order,
the Name Index in Registry Number Order, and the Molecular Formula Index.
The systematic arrangement of data in each DAT facilitates the handling of
chemical information for substances in the Common Data Base.

CAS had previously developed a computerized compound data system under
Task I of this contract, which was referred to as a DAT system. However, the
requirements for this earlier system were less sophisticated than those of
Task III. Therefore, it became necessary for CAS to develop a totally new DAT
system to meet the needs of Task III. Two Task III DAT's have been issued to
date: the first in February 1967, the second in May. The February DAT was
published principally to establish the future format of each volume and
secondarily to test the computer programs required to produce the
publication. The May DAT was the first full publication of this type of
document for Task III.

Below is a description of each section of the DAT. 

Name Index in Alphabetical Order 

This form of the DAT is arranged in alphabetical order of names (except for 
laboratory numbers), beginning with the Roman letters constituting the main 
portion of the name. The following information is provided for each entry: 

Registry Number - the unique number given to each substance, which 
identifies that substance throughout all CAS operations. 

Item Number - used in conjunction with the Registry Number to provide a 
unique means of accessing the items of data in the CAS Bibliographic File. 

Name Type - identifies the type of name. 

Laboratory Numbers - appear in ascending numerical order ahead of the 
other names, which are in alphabetical order. 

Molecular Formula - arranged in NOPS sequence. 

Source Abbreviation - printed immediately after the molecular formula. 
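
The fields listed above can be pictured as a simple record in which the Registry Number and Item Number together act as the unique key. The following Python sketch is only an illustration: the class names `NameEntry` and `NameIndex` are hypothetical and do not describe the actual CAS programs, and the sample values are taken from entries for one substance in the Registry Number Order sample.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NameEntry:
    """One DAT entry (hypothetical layout, for illustration only)."""
    registry_number: str
    item_number: str
    name_type: int
    name: str
    molecular_formula: str
    sources: Tuple[str, ...] = ()


class NameIndex:
    """Hypothetical in-memory index keyed by (Registry Number, Item Number)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], NameEntry] = {}

    def add(self, entry: NameEntry) -> None:
        key = (entry.registry_number, entry.item_number)
        if key in self._entries:
            raise ValueError(f"duplicate key {key}")
        self._entries[key] = entry

    def lookup(self, registry_number: str, item_number: str) -> NameEntry:
        return self._entries[(registry_number, item_number)]

    def names_for(self, registry_number: str) -> List[str]:
        """All names and synonyms recorded for one substance."""
        return [e.name for e in self._entries.values()
                if e.registry_number == registry_number]


index = NameIndex()
index.add(NameEntry("922554", "12", 1, "Alanine, 3,3'-thiodi-, L-",
                    "N2O4SC6H12", ("CA",)))
index.add(NameEntry("922554", "67", 3, "Lanthionine, L-",
                    "N2O4SC6H12", ("CA",)))
print(index.names_for("922554"))
```

Keying on the pair rather than on the Registry Number alone is what lets several names for the same substance coexist while each remains individually addressable.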

Name Index in Registry Number Order 

The Name Index in Registry Number Order contains the same information 
as the Name Index in Alphabetical Order. However, as the name implies, 
this DAT is arranged in ascending Registry Number order. This arrangement 
permits all names and synonyms associated with a compound or mixture to be 
grouped together. In the sorting sequence used, alphabetically prefixed 
Registry Numbers sort ahead of regular Registry Numbers; thus, mixtures 
appear first in this document. 
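
The sorting sequence just described can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that a prefixed Registry Number is written as one or more capital letters followed by digits (`M1005` and `A200` are made-up values); the report does not specify the prefix format, and `registry_sort_key` is a hypothetical helper.

```python
def registry_sort_key(registry_number: str):
    """Sort key placing alphabetically prefixed numbers ahead of regular ones.

    Assumes a well-formed value: optional capital letters followed by digits.
    """
    digits = registry_number.lstrip("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    prefix = registry_number[: len(registry_number) - len(digits)]
    group = 0 if prefix else 1  # prefixed numbers sort first
    return (group, prefix, int(digits))


numbers = ["922554", "M1005", "57669", "A200"]
print(sorted(numbers, key=registry_sort_key))
# prints ['A200', 'M1005', '57669', '922554']
```

Comparing the numeric part as an integer, rather than as text, keeps 57669 ahead of 922554 even though the two values have different lengths.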

Molecular Formula Index 

This form of DAT is ordered according to the molecular formula arranged 
in NOPS sequence. The arrangement of this particular volume is influenced by 
a format convention employed in all the DAT’s, the "dot-disconnected" format. 
Used principally to represent the structures and molecular formulas of 
metal salts of acids, acid salts of bases, quaternary ammonium salts, and 
addition compounds, this format represents these types of compounds as two or 
more individual structures separated by a dot. Upon ordering, the format 
collects closely related compounds in one place. An example of such a 
representation is O2C2H4·Na, for sodium acetate. The dot itself appears only 
on the printed output, not on the machine record, of the molecular formula. 

In the ordering of the molecular formulas for the DAT’s, the portion of the 
"dot-disconnected" formula that contains the highest carbon count is given first. 

Other data given in this DAT include the Registry Number, the Item Number, 
and the CA index names. Two types of names are listed, the Preferred CA Index 
Name and the Added CA Index Name. For a detailed flow diagram of the 
computerized DAT program, see Figure 5. Samples of the results of this program 
are shown in Figures 6-8. 
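
The rule that the portion with the highest carbon count is given first can be illustrated with a short Python sketch. It assumes that each portion of a dot-disconnected formula is available as a plain string without subscripts (for example `NC6H7`); the helper names `carbon_count` and `dot_disconnected` are hypothetical and are not taken from the DAT programs.

```python
import re


def carbon_count(formula: str) -> int:
    """Number of carbon atoms in one formula portion (0 if none).

    A capital C followed by a lowercase letter (Cl, Ca, ...) is another
    element and is skipped.
    """
    match = re.search(r"C(?![a-z])(\d*)", formula)
    if match is None:
        return 0
    return int(match.group(1) or 1)


def dot_disconnected(portions: list) -> str:
    """Join portions with a dot, highest carbon count first."""
    ordered = sorted(portions, key=carbon_count, reverse=True)
    return "·".join(ordered)


print(dot_disconnected(["O4SH2", "NC6H7"]))   # aniline sulfate
print(dot_disconnected(["Na", "O2C2H4"]))     # sodium acetate
```

Because the organic portion always leads, a salt sorts next to its parent compound in the Molecular Formula Index, which is the grouping effect the convention is meant to achieve.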



MANUAL STRUCTURE FILE 

Some 12,000 5 x 8-inch cards comprising the first shipment of the Manual 
Structure File were delivered in May 1967. Except for a graphic-arts-quality, 
hand-drawn structure, these cards are printed by computer (see Figure 9). In 
addition to the structure, the file contains the Registry Number, NOPS molecular 
formula, and CA Preferred Index Name for a large majority of the fully defined 
structures in the Common Data Base. Two sets of cards were produced: one 
ordered by ascending Registry Number and the other by molecular formula. It 
is anticipated that approximately 8,000 additions will be made to this file 
during the remainder of the contract. 
Figure 5. DESKTOP ANALYSIS TOOL SYSTEM 
REGISTRY  ITEM    NAME 
NUMBER    NUMBER  TYPE  NAME, MOLECULAR FORMULA, AND SOURCES 

2481949   13B     3     Aniline, N,N-diethyl-p-(phenylazo)-  N3C16H19  CI 
533700    19      1     Aniline, 2,4-diiodo-  NI2C6H5 
102567    19      1     Aniline, 2,5-dimethoxy-  NO2C8H11  CA,CI 
121697    14      1     Aniline, N,N-dimethyl-  NC8H11  CA,CI 
539173    125     3     Aniline, N,N-dimethyl-4,4'-azodi-  N4C14H16  CAS,CI 
138896    12      1     Aniline, N,N-dimethyl-p-nitroso-  N2OC8H10  CAS,CA 
60117     808     3     Aniline, N,N-dimethyl-p-(phenylazo)-  N3C14H15 
97029     17      1     Aniline, 2,4-dinitro-  N3O4C6H5  CA,CI 

[Excerpt of the sample page, which lists names under the heading "Aniline".] 

FIGURE 6 - Sample Format of DAT - Name Index in Alphabetical Order 
REGISTRY  ITEM    NAME 
NUMBER    NUMBER  TYPE  MOLECULAR FORMULA, NAME, AND SOURCES 

922554                  N2O4SC6H12 
          12      1     Alanine, 3,3'-thiodi-, L-  CA 
          67      3     Lanthionine, L-  CA 
          34      3     L-Lanthionine  CBAC, MERCK 

922565                  N2O4SC6H12 
          15      1     Alanine, 3,3'-thiodi-, meso-  CA 
          37      3     meso-Lanthionine  MERCK 
          46      3     Mesolanthionine  CA 

923068                  O4BrC4H5 
          4A      3     Bromosuccinic acid  CA, MERCK 
          39      3     Monobromosuccinic acid  MERCK 
          17      1     Succinic acid, bromo-  CA 

[Excerpt of the sample page.] 

FIGURE 7 - Sample Format of DAT - Name Index in Registry Number Order 
MOLECULAR        REGISTRY  ITEM 
FORMULA          NUMBER    NUMBER  CA INDEX NAMES 

NO4SC12H13       6259503   36      2-Naphthalenesulfonic acid, 6-(dimethylamino)-4-hydroxy- 
                 6259514   39      2-Naphthalenesulfonic acid, 6-(ethylamino)-4-hydroxy- 
NO4SC13H19       57669     13      Benzoic acid, p-(dipropylsulfamoyl)- 
NO4SCl2C7H5·Na   5698566   12      Benzoic acid, p-(dichlorosulfamoyl)-, sodium salt 

[Excerpt of the sample page.] 

FIGURE 8 - Sample Format of DAT - Molecular Formula Index 
59325     N3ClC16H20 

Pyridine, 2-[(p-chlorobenzyl)[2-(dimethylamino)ethyl]amino]- 

[Hand-drawn structural formula] 

FIGURE 9. Example of Manual Structure File Card 
TAPE OUTPUT 

The contract requires CAS to develop the computer routines needed to 
convert the output of its own computer, an IBM 360, to a form compatible with 
the NLM computer, a Honeywell 200. The method being developed by CAS to meet 
this requirement was discussed previously. Acquiring this capability has 
required very close liaison between personnel of both organizations. 

Seven test tapes have been generated by CAS and tested by NLM in an effort 
to establish the format and language required by the Honeywell computer. The 
seventh tape, which was sent to NLM in June of this year, has shown that the 
development of this format is near success. With modifications to tape 
No. 7, it is expected that the next test tape will successfully perform its 
function. After these difficulties have been overcome, the Composite File will 
be put on computer tape and sent to NLM. See Appendix D for a brief description 
of the contents of each test tape. 

FDA COMPOSITE FILE 

The FDA Composite File is a compilation of the data contained in the 
Common Data Base. In this respect, it is identical to the NLM Tape Output 
previously described. However, a major difference exists between the two 
computer files. While the NLM file is to be in a format readable by a Honeywell 
200 computer, the FDA file will be readable by an IBM 360 computer, the computer 
for which CAS programs are being reprogrammed. 

At FDA's request, the delivery of the tapes, documentation, and programs 
needed to process the Composite File has been delayed pending the installation 
of FDA's computer and the complete reprogramming of the CAS routines from the 
IBM 7010 computer to the IBM 360 computer. 
FDA REGISTRY WORKSHOP 

FDA's evolving computerized information system will interface directly 
with the CAS Chemical Compound Registry System. For this reason, the FDA 
requested and CAS conducted a training session for a group of FDA personnel to 
familiarize them with the Registry System. 

The symposium was held at FDA training headquarters in Arlington, Virginia, 
and was attended by 14 FDA chemists drawn from several of that organization's 
divisions and bureaus. The primary interests of these professionals lay in 
the following fields: 

• Drug analysis 

• Analytical chemistry 

• Food chemistry 

• Microbiology of antibiotics 

• Pharmaceutical chemistry 

• Biochemistry 

• Toxicology 

• Science information retrieval 



The CAS personnel who acted as instructors were the following: 

W. C. Davenport - Senior Staff Advisor 

M. K. Park - Head, Formula Indexing Department 

R. E. Stobaugh - Technical Advisor, Chemical Compound Registry Division 

R. W. White - Assistant to the Director, Research and Development Division 

Although the original purpose of the workshop was to present an overview 
of the Chemical Compound Registry System, the actual workshop resulted in a 
fairly comprehensive study of the procedures and conventions used in the 
Registry System, primarily because of the reception it received from FDA 
personnel and the quality of the instruction. 
During the three-day symposium, many specific areas of the Chemical 
Compound Registry System were discussed, including the following: 

1. Overview of CAS computer-based information system 
2. Confidentiality of data 
3. Overview of the Registry System 
4. General structure conventions 
5. Special structure conventions with particular emphasis upon stereochemistry 
6. Handling of nomenclature (names, DAT’s, name-match, cross-references) 
7. Methods of manual registration 
8. Substructure search methods 

During these discussions, the instructional materials outlined in Table II were 
used to aid the participants' understanding of the subject matter and to serve 
as reference material after the symposium ended. In addition to the presentations 
given by the CAS personnel, work sessions were held to give the FDA chemists 
first-hand experience in some of the techniques used at CAS. Special emphasis 
was placed upon the use of DAT's. 

Communications received from FDA after the session indicated that the 
workshop was very helpful to the FDA chemists and that their knowledge of the 
Registry System had been greatly enhanced by the three-day symposium. Interest 
was such that many of the attendees requested that additional training sessions 
be held. 

Many other meetings and conferences were held between personnel of the 
various organizations involved in Contract NSF-C414. A list of these, including 
a brief description of the topics discussed, is given in Table III. 
TABLE II 

INSTRUCTIONAL MATERIAL FOR FDA WORKSHOP 

1. Registry System Description 

A very general description of the system and the fundamentals, 
including the work flow and description of the Registry form. 

2. Registry System Stereochemistry 

Treatment of general and special text descriptors, including 
the list of permitted descriptors. 

3. Registry System Structure Conventions 

Conventions in general and for metal salts, addition compounds, 
salts of bases, quaternary compounds, carbohydrates, incompletely 
described structures, metal coordination compounds, and inorganic 
compounds. 

4. Registry System Manual Registration 

General description, descriptions of mixture and proposed 
structure handling. 

5. The Naming and Indexing of Chemical Compounds from Chemical 
Abstracts. 

6. Registry Sheet forms. 

7. Reprints of papers: 

a. F. Tate, Progress Toward a Computer-Based Chemical 
Information System, C & EN, January 23, 1967. 

b. D. Leiter, H. Morgan, and R. Stobaugh, Installation and 
Operation of a Registry for Chemical Compounds, J. Chem. Doc. 
Volume 5 No. 238 (1965). 

c. H. Morgan, Generation of Unique Machine Description for 
Chemical Structures, J. Chem. Doc. Volume 5 No. 107 (1965). 

d. D. Whittingham, F. Wetsel, and H. Morgan, Computer-Based 
Subject Index Support System, J. Chem. Doc. Volume 6 No. 4 
(1966). 
TABLE III 

SIGNIFICANT CONFERENCES HELD BETWEEN PARTIES 
OF CONTRACT NSF-C414, TASK III, FROM 1 JULY 1966 THROUGH 30 JUNE 1967 
GLOSSARY 



CA - Chemical Abstracts 

CAS - Chemical Abstracts Service 

Combination - This is a mixture in which all the components are identified 
and in which the ratio of components may or may not be specified. A 
Combination may also be defined only in terms of the "so-called" active 
components, and the ratio of the active components may or may not be given. 

Common Data Base - This is made up of the published data associated 
with the substances specified in Task III for processing under this contract. 
The data elements in the Common Data Base are: (1) whatever 
structural description of the substance is provided; (2) Inverted Molecular 
Formula, when available; (3) Nomenclature; (4) Source Codes; (5) CA 
references for those synonyms taken only from the Registry files. 

Composite File - A magnetic tape file which includes the data associated 
with the substances in the Common Data Base and associated MeSH Class Terms. 
The File will be arranged in name order and will include for each substance 
the following data as they are available: 

1. Registry Number 

2. Nomenclature 

3. Name-Type Code 

4. CAS Item Number 

5. MeSH Terms 

6. Source Codes 

7. Inverted Molecular Formula 

Compound - A single substance made up of identical molecular species. 

Connection Table - Atom-by-atom inventory of the molecule which shows 
each atom, the atoms connected directly to it, and types of linking bonds. 
Mass number, coordination number, valence, and charges are shown whenever they 
are required for exact identification. Stereochemical data are included. 
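
As an illustration of the idea, a connection table can be sketched as a list of atoms, each recording its element and the bonds to its neighbors. The Python layout below (the `Atom` class, the `connect` and `element_counts` helpers, and the bond labels) is hypothetical and is not the machine format used in the Registry System; stereochemical data and mass numbers are omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Atom:
    element: str
    # neighbor position in the table -> bond type ("single", "double", ...)
    bonds: Dict[int, str] = field(default_factory=dict)
    charge: int = 0


def connect(table: List[Atom], i: int, j: int, bond: str) -> None:
    """Record a bond between atoms i and j in both directions."""
    table[i].bonds[j] = bond
    table[j].bonds[i] = bond


def element_counts(table: List[Atom]) -> Counter:
    """Tally of atoms by element, i.e. the raw molecular formula."""
    return Counter(atom.element for atom in table)


# Formaldehyde, H2C=O, with explicit hydrogens.
table = [Atom("C"), Atom("O"), Atom("H"), Atom("H")]
connect(table, 0, 1, "double")
connect(table, 0, 2, "single")
connect(table, 0, 3, "single")

print(element_counts(table))
```

A table of this kind carries enough information to derive the molecular formula, which is why the formula can be treated as a by-product of the structural record.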

Desktop Analysis Tools - Printed lists of specially selected substances 
for use in the analysis of input information to help identify already 
registered substances. These tools will include listings of names, laboratory 
numbers, and acronyms with the associated Registry Numbers. Molecular formula 
indexes are also included. 

FDA - Food and Drug Administration 

Index Name - A name of a substance as used in the current Subject and 
Formula Indexes to Chemical Abstracts. The Preferred Index Name is used as 
the main heading in the Subject Index for index entries associated with the 
substance; this name is also used as the Formula Index entry for a compound. 
Added Index Names are special-purpose entries in the Subject Index which 
emphasize special structural features or identify useful author-derived 
nomenclature. 

Inverted Molecular Formula - In this form of molecular formula, nitrogen, 
oxygen, phosphorus and sulfur are listed first in that order; these are followed 
alphabetically by symbols for elements other than carbon and hydrogen which 
come last. 
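
The ordering rule in this definition can be expressed as a short Python sketch. The function name `inverted_formula` and the use of a dictionary of element counts as input are assumptions made for illustration; subscripts are written as plain digits.

```python
NOPS = ("N", "O", "P", "S")
LAST = ("C", "H")


def inverted_formula(counts: dict) -> str:
    """Format element counts in NOPS order.

    N, O, P, S come first in that order, then all other elements
    alphabetically, with carbon and hydrogen last.
    """
    first = [e for e in NOPS if e in counts]
    middle = sorted(e for e in counts if e not in NOPS and e not in LAST)
    last = [e for e in LAST if e in counts]
    return "".join(
        symbol + (str(counts[symbol]) if counts[symbol] > 1 else "")
        for symbol in first + middle + last
    )


print(inverted_formula({"C": 16, "H": 20, "Cl": 1, "N": 3}))
# prints N3ClC16H20
```

The sample input corresponds to the formula printed on the Manual Structure File card for Registry Number 59325.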

Item Number - A five-digit number that, used in conjunction with a 
Registry Number, identifies each item of data (e.g., each name, structure, 
molecular formula, etc.) associated with that Registry Number. 

MEDLARS - Medical Literature Analysis and Retrieval System 






MeSH Term - This is the Medical Subject Heading. In the context of this 
contract a MeSH Term is either an assigned specific name (Unique MeSH Term) 
for a substance or a generic characteristic (MeSH Class Heading) related to 
the substance to which the Class Term is assigned. A given substance may have 
both a Unique and a Class MeSH Term. 

Unique MeSH Term - Not all substances are represented by a 
Unique MeSH Term. The initial assignment of a Unique MeSH Term is 
always carried out by the NLM staff. Once a substance has been 
assigned a Unique MeSH Term, this Term is always used within the 
MEDLARS System as an index entry for the substance. Presently NLM 
has between 1,000 and 1,200 Unique MeSH Terms corresponding to 
substances. These Terms are a form of nomenclature in CAS Registry 
files. 

MeSH Class Term - This is a generic heading representing a 
family of substances in MEDLARS. Certain Class Terms are based on 
substructural units, for example, phenothiazines, while others are 
based on biological activity, etc., for example, antibiotics. 

Molecular Formula - A listing of the type and total number of each atom 
present in a molecule. 

Name-Type Code - This is a numerical code attached to each entry in the 
Registry Nomenclature File. The codes identify: 

1. Preferred CA Index Name (Inverted Form) 

2. Added CA Index Name 

3. Author or Trivial Name 

4. Preferred CA Index Name (Uninverted Form) 

NLM - National Library of Medicine 



Nomenclature - All names for substances, including acronyms and laboratory 
numbers. 

NSF - National Science Foundation 

Organic Compounds - For the purposes of this contract these are carbon- 
containing compounds. 

Registration - The process of determining the existence or nonexistence 
of a substance in the Registry Files. The process includes the assignment of 
a unique number (Registry Number) to each substance that is new to the files; 
this number is to be used in a large, multifaceted system to associate data 
related to that substance. 

Registry Number - The unique number which is assigned to each substance 
when it first enters the Registry and which is recalled each time that sub- 
stance is checked against the file. The Registry Number may be used to identify 
fully the substance, and in the future it can be used as the address in special- 
ized subject files to identify data associated with the substance. A Registry 
Number may include alphabetic characters, and will include a computed check 
digit. 
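
The report does not describe how the check digit is computed. As an illustration only, the Python sketch below assumes the weighted-sum rule documented for present-day CAS Registry Numbers: each digit ahead of the check digit is multiplied by its position counted from the right, and the check digit is the sum modulo 10. Whether the 1967 system used exactly this rule is an assumption, and alphabetically prefixed numbers are not handled.

```python
def check_digit(body: str) -> int:
    """Check digit for the digits that precede it (assumed modulo-10 rule)."""
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10


def is_valid(registry_number: str) -> bool:
    """True if the final digit matches the computed check digit."""
    digits = registry_number.replace("-", "")
    if not digits.isdigit() or len(digits) < 2:
        return False
    return check_digit(digits[:-1]) == int(digits[-1])


print(is_valid("121697"))  # prints True
print(is_valid("121698"))  # prints False
```

A check of this kind catches any single mistyped digit in the weighted positions it covers, which is useful when numbers are transcribed by hand from printed tools such as the DAT's.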

Registry System - The interrelated set of files directly associated with 
registration and the processes (including manual and computer-based facets) 
for accomplishing registration. These computer files include any available 
structural record, the molecular formula (when available), nomenclature, and 
bibliographic data. 

Source Code - A set of codes attached to each name in the Registry 
Nomenclature File that identifies the source(s) in which the name is used, 
for example, Journal of Biological Chemistry, Chemical Abstracts, and 
Merck Index, or private sources. 



Structural Formula - A projected two-dimensional graphic representation 
of the atoms and bonds of a molecule. 

Substructure - A specified set of atoms interconnected in a specified 
way; this constellation normally represents less than a complete 
SOURCES OF THE COMMON DATA BASE 



This Appendix is an alphabetical listing of the books and other 
references from which data on Common Data Base substances were taken. 



*Wilson, C. O., Jones, T. E.: American Drug Index. Philadelphia: 
J. B. Lippincott Co., 1966. 

Council of the Pharmaceutical Society of Great Britain: British Veterinary 
Codex. 1966 Edition. London: The Pharmaceutical Press. 1966. 

Gleason, M. N., Gosselin, R. E., Hodge, H. C.: Clinical Toxicology of 
Commercial Products. 2nd Edition. Baltimore: Williams and Wilkins Co., 
1963. Ingredients Index, pp. 1-126. 

*Code of Federal Regulations, Title 21, Chapter 1, (Parts 1-129), Parts 
8-9, MO, 120, 121.200-121.265. Subject Index to Part 121 and selected 
portions relating to Part 120 issued since 1 January 1965. Washington, 
D. C.: U. S. Government Printing Office. 1966. pp. 75-126, 327-367, 
UOO-M 9. Only substances for which names appear as boldface entries 
preceded by paragraph numbers were input. 

*Code of Federal Regulations, Title 21, Chapter 1, (Parts 130-end), Parts 
130-133, 141a-e, 146a-e, lMa-x. Washington, D. C.: U. S. Government 
Printing Office. 1966. pp. 5-60, 72-223, 261-566, 577-728. Only 
substances for which names appear as boldface entries preceded by paragraph 
numbers were input. 

The Society of Dyers and Colourists: Colour Index. 2nd Edition. London: 
Percy Lund, Humphries and Co., Ltd. 1956. Vol. 3; 1963 Suppl., pp. S621- 
S1124. 

International Organization for Standardization: ISO Recommendations, R116, 
R219, R258, R290, "Common Names for Pesticides." Geneva: ISO. 1959, 1961, 
1962, 1963. 

Sax, N. I.: Dangerous Properties of Industrial Materials. New York: 
Reinhold Publishing Corp. 1961. 

*McCutcheon, J. W.: Detergents and Emulsifiers. 1966 Edition. Morristown, 
N. J.: John W. McCutcheon, Inc. 1966. 

*Drug and Cosmetic Industry: Drug and Cosmetic Catalog. 17th Edition. New 
York: Drug and Cosmetic Industry, 1966-67. 

Chemical Abstracts Service: Drug File (internal) 



*Source is an approved substitution of a newer edition for the one called 
for in the original contract. 



*Farm Chemicals: Farm Chemicals Handbook. 52nd Edition. Willoughby, Ohio: 
Meister Publishing Co. 1966. pp. D247-99. 

Animal Health Institute: Feed Additive Compendium. Minneapolis: Miller 
Publishing Co. 1966. Section 4, pp. 99-371. (Only substances for which 
names appear as boldface entries were input.) 

National Academy of Sciences-National Research Council: Food Chemicals 
Codex. (NAS-NRC Publication No. 1143). Washington, D. C.: U. S. Govt. 
Printing Office. 1963. Parts I-X, pp. 1-722. 

Martin, H.: Guide to Chemicals Used in Crop Protection. 4th Edition. 
London, Ontario: Research Branch, Canada Dept. of Agriculture. 1961. 

Spencer, E. Y.: Guide to Chemicals Used in Crop Protection. Supplement 
to 4th Edition. London, Ontario: Research Branch, Canada Dept. of 
Agriculture. 1964. 

Zimmerman, O. T., Lavine, I.: Handbook of Material Trade Names. Dover, 
N. H.: Industrial Research Service, Inc. 1953; Supplement I. 1956; 
Supplement II. 1957; Supplement III. 1960. 

Spector, W. C. (Editor): Handbook of Toxicology. Vol. 2, Antibiotics. 
Philadelphia: W. B. Saunders Co. 1957. 

Wessel, C. J., Bejuki, W. M.: "Industrial Fungicides." Ind. Eng. Chem. 
Vol. 51(4), 52A-63A (April, 1959). 

de Navarre, Maison G.: International Encyclopedia of Cosmetic Material 
Trade Names. New York: Moore Publishing Co., Inc. 1957. pp. 3-290. 

World Health Organization: International Non-proprietary Names. 
Cumulative List, 1962. Geneva: WHO. 1962. pp. 7-49. 

World Health Organization: International Pharmacopeia. 1st Edition. Geneva: 
WHO. 1951. Vol. 1. pp. 9-258. Vol. 2. 1955. pp. 3-217. Supplement 
1959. pp. 3-106. (Only substances for which names appear in the large, 
boldface headings were input. Entries differing only in physical state were 
not separately identified, for example, Sodium Phenobarbital and Sodium 
Phenobarbital Injection.) 

List of Colors Appendix pp. 44-6. 

Stecher, P. G. (Editor): The Merck Index of Chemicals and Drugs. 7th 
Edition. Rahway, N. J.: Merck and Co., Inc. 1960. pp. 1-1121. 

The Merck Veterinary Manual. Rahway, N. J.: Merck and Co., Inc. 1955. 






Goodhart, R. S. (Editor): Modern Drug Encyclopedia and Therapeutic Index. 
10th Edition. New York: Reuben H. Donnelley Corp. 1965. 

Modern Veterinary Practice. Red Book Edition. Vol. 47 (5). (April 15, 
1966). pp. 195-2^3. 

Wogan, G. N. (Editor): Mycotoxins in Foodstuffs. Cambridge, Mass.: 
MIT Press. 1965. pp. 29, 34, 40, 41, 59, 64, 83, 84, 118-121, 127, 140, 
177, 266-272. 

Committee on National Formulary: The National Formulary. 12th Edition. 
Washington, D. C.: American Pharmaceutical Association. 1965. pp. 10- 
428. (Only substances for which names appear in the large, boldface 
headings were input. Entries differing only in physical state were not 
separately identified, for example, Sodium Phenobarbital and Sodium 
Phenobarbital Injection.) 

*American Medical Association: New Drugs. 1966 Edition. Chicago: AMA. 
1966. pp. 1-543. (Only substances for which names appear in the large, 
boldface headings were input.) 

Poucher, W. A.: Perfumes, Cosmetics and Soaps, Vol. 1. 6th Edition. 
London: Chapman and Hall Ltd. 1959. pp. 3-434. (Entries relating to species 
of the plant and animal kingdom were not input; however, extracts derived 
from plants and animals were input.) 

Frear, D. E. H. (Editor): Pesticide Index. 3rd Edition. State College, 
Penna.: College Science Publishers. 1965. 

U. S. Pharmacopeial Convention, Inc.: The Pharmacopeia of the United States 
of America. 17th Revision. New York: U. S. Pharmacopeial Convention, Inc. 
1965. pp. 13-766. (Only substances for which names appear in the large, 
boldface headings were input. Entries differing only in physical state were 
not separately identified, for example, Sodium Phenobarbital and Sodium 
Phenobarbital Injection.) 

*Folsom, J. Paul (Editor): Physicians' Desk Reference. 20th Edition. Oradell, 
N. J.: Medical Economics, Inc. 1966. pp. 189-273, 502-1092. 

Johnson, O. H., Krog, N. E., Poland, J. L.: "Pesticides, Part 1." Chem. 
Week, Vol. 92, pp. 128-48 (May 25, 1963). 

Johnson, O. H., Krog, N. E., Poland, J. L.: "Pesticides, Part 2." Chem. 
Week, Vol. 92, pp. 63-90 (June 1, 1963). 

South African Med. J. Vol. 39, pp. 762-4. 

Lehman: Summaries of Pesticide Toxicity. 1965. 






*USAN Council: United States Adopted Names. lith Edition. New York: U.S. 
Pharmacopeial Convention, Inc. January, 1966. pp. 9-78. 

Unlisted Drugs. Vol. 1 (19^8) - Vol. 18 (1) (1966). New York: Special 
Libraries Association. 

U. S. Dept. of Agriculture, Pesticide Regulation Division: USDA Summary 
of Registered Agricultural Pesticide Chemical Uses. 2nd Edition, 
Supplement I. Washington, D. C.: USDA, Agricultural Research Service. 1965. 

*Stephenson, H. C. (Editor): Veterinarians' Blue Book. 14th Edition. 
New York: Reuben H. Donnelley Corp. 1966. pp. 1-109. 

Jones, L. M.: Veterinary Pharmacology and Therapeutics. 3rd Edition. 
Ames, Iowa: Iowa State Univ. Press. 1965. 


